Collaborating neuroscience online: The case of the Human Brain Project forum

This paper analyzes user interactions on the public-access online forum of the Human Brain Project (HBP), a major European Union-funded neuroscience research initiative, to understand the utility of the Forum for collaborative problem solving. We construct a novel dataset from discussion forum posts and detailed user profiles on the HBP Forum. We find that HBP Forum utilization is comparable to that of a leading general-interest coding platform, and that online usage metrics quickly recovered after an initial Covid-19-related dip. Regression results show that user interactions on the Forum are more active for questions on programming and in HBP core areas. Further, Cox proportional hazard analyses show that such problems are solved faster. Forum posts with users from different countries tend to be discussed more actively but solved more slowly. A higher share of administrator support is associated with faster problem solving. There are no clear patterns regarding gender or seniority. Our results suggest that building novel collaborative forums can support researchers working on complex topics in challenging times.


Introduction
Neurological disorders are the leading cause of disability and the second leading cause of death worldwide, accounting for 9 million deaths (16.5% of total global deaths) and the loss of 276 million disability-adjusted life years in 2016 [1]. Most brain diseases have no cure, and many existing treatments are very expensive. Meanwhile, public and private investment in artificial intelligence (AI) for health care projects is growing, with health care being the sector most invested in by AI investors [2]. Although many life science areas, including neuroscience, are traditionally more laboratory- and experiment-based, both Covid-19 and the massive global cost of neurological diseases heighten the need to harness digitization in upstream health care markets. Since 2013, brain science initiatives have been launched in Europe, the US, Israel, Japan, and China [3], and an international brain initiative is emerging [4], leading to a burgeoning "brain race" [5]. This paper studies the public-access online forum of one of the earliest brain initiatives, the Human Brain Project (HBP). Launched in October 2013, the HBP is a flagship science initiative in the European Commission's Future and Emerging Technologies program and the recipient of a 10-year, €1 billion grant. The HBP aims to advance brain research and improve treatments for neurological diseases by merging neuroscience with information and communication technology (ICT), including computational science, robotics, and artificial intelligence [6]. As of 2021, a total of 179 institutions from over 20 countries had participated in the HBP. In addition to making grants to research partners, the HBP allocates resources to build digital platforms, including the public-access online HBP Forum. As a major public discussion channel of the HBP, the Forum, through its utilization and the collaborative problem solving it hosts, offers a valuable case study of institutional design to facilitate scientific collaboration.
We construct a novel dataset to examine whether and to what extent the HBP Forum is actively used, which factors are tied to richer online user interactions, and whether the HBP Forum offers an effective platform for problem solving. We collected data from public sources to capture all user interactions and discussion content on the HBP Forum, as well as characteristics of forum user profiles (e.g., demographics, institutions, scientific areas). We categorize the discussion threads by the nature of their topics and further identify whether, when, and by whom a question has been solved. The data reveal that the HBP Forum is well utilized and remained resilient during Covid-19, both at the extensive margin of usage and at the intensive margin of user interactions. Using these novel data, this paper offers the first systematic empirical analysis of the utilization and performance of the HBP Forum.
We employ regression analyses to investigate which factors are associated with richer user interactions, measured by the number of user replies per post within a quarter. We analyze covariates related to the content of discussed topics, the technical aspects, and the demographic and institutional profiles of users who post the initial questions and users who reply. We further create a content-based measure of whether and when each question raised is solved effectively. We define a post as solved effectively if the asking user confirms the proposed solution; when direct confirmation is not available, we label the solution status based on the content and on confirmation by other users. We then use Cox proportional hazard models to analyze the time taken to solve a posted problem and the covariates that accelerate problem solving. We find that questions closely tied to the HBP platforms and questions on programming issues with a higher share of explicit code in communications generate more discussion, especially when participating users are geographically more diverse. Questions posted on the Forum are solved faster when HBP administrators participate and when code snippets are shared. Richness of interaction and likelihood of solution appear to be independent of participating users' HBP affiliation status.
This paper contributes to two strands in the literature. First, our paper contributes to studies about knowledge-sharing platforms by studying a large-scale, multinational scientific forum. As research specialization increases, knowledge-sharing and collaboration become increasingly essential for knowledge creation [7,8], which further promotes diversity in the process [9,10]. Knowledge diffusion can be spurred by offline research institutions that further advance scientific discoveries [11,12]. With the rise of remote work, online discussion forums spur knowledge sharing and creation with evolving communities and flat hierarchies [13][14][15]. Prior studies have examined patterns and drivers of organizational sharing in sub-national and proprietary platforms [16][17][18][19]. Recent qualitative studies suggest that life science platforms can function well by pooling resources from interdisciplinary areas [20,21]. We further offer a quantitative study of an online life science forum backed by a supranational organization.
Second, studies on digitization in health care often focus on the adoption and utilization of health information technologies (HIT) among downstream users (e.g., health providers); our study complements this work by examining a research-oriented digital forum for upstream users (i.e., neuroscientists). Studies find that HIT improve health outcomes mainly for complex conditions or specific populations [22][23][24][25], and that HIT can complement other programs, e.g., in combating the opioid crisis [26]. However, the cost increases arising from HIT are also substantial [27], although the cost burden is less concerning in IT-intensive locations that provide complementary assets [28]. Given that most brain-related diseases are complex and effective treatments are rare, digitization can provide new channels to spur global research collaborations for treatments. Some studies hypothesize that digital platforms can help advance neuroscience [29,30], and our paper provides the first systematic analysis of how a digital forum is used by neuroscientists.
In addition, our study has policy implications for the design of online institutions for life sciences research. Both novel design and effective utilization of online institutions have become increasingly important given the disruption arising from Covid-19, which is reflected in the creation of the National Virtual Biotechnology Laboratory (NVBL, science.osti.gov/nvbl) by the US Department of Energy (DOE) as a consortium of DOE national laboratories. The pre-Covid-19 experience of the HBP offers a setting to understand how to proactively build institutional capacity that remains resilient during a disruptive period such as a pandemic. While it is difficult to evaluate the long-term impact of such projects, our analyses of contemporary performance on the forum can help inform policy regarding certain aspects of online institutional designs.

The Human Brain Project (HBP) and the HBP Forum
The HBP was launched in 2013, finished a ramp-up phase in 2016, and entered three specific grant agreement phases in 2016, 2018, and 2020 (Fig 1(a)). The HBP includes 12 sub-project areas: six generic topic areas (i.e., mouse brain organization, human brain organization, systems and cognitive neuroscience, theoretical neuroscience, management and coordination, ethics and society) and six platform-related sub-projects (i.e., neuroinformatics, brain simulation, high-performance analytics and computing, medical informatics, neuromorphic computing, and neurorobotics). Fig 1(b) shows the structure of the platforms. Access to the HBP platforms is granted to applicants from partner institutions; other users can request an account, and their requests are evaluated on a case-by-case basis. Besides the general areas, research teams build project-specific private repositories whose data are accessible only to the related team members. From 01/2018, the HBP began building EBRAINS on the new EU neuroscience supercomputing centers; the sub-platforms keep the same focus and are expected to perform better. The HBP platforms are hosted on a centralized access point: the HBP Collaboratory from 03/2016 to 09/2021 and, thereafter, EBRAINS, which remains the host to date.
The HBP Forum was launched in July 2015 as an integral connecting part of the HBP platform infrastructure. The Forum is a public discussion website for HBP-related topics, including questions on HBP-related activities in general, neuroscience progress, and programming challenges. Much like an HBP-specific "Stack Overflow", the Forum makes the topics raised and discussed public, so they can be read without registering an account, but only users with a Forum account can reply to or comment on them. Since the Forum is designed for public discussion, anyone interested in participating can create an account. Users do not have to be HBP-affiliated to use the Forum, and users with an HBP account can use the same account for the Forum. The HBP Forum is therefore designed to facilitate informal collaboration and knowledge sharing between researchers within and beyond the HBP community.
In the absence of detailed project-level data, user interactions in the public HBP Forum are the best available source to analyze HBP platform utilization and real outcomes of the Forum for the neuroscience research community. In addition, the HBP Forum retained the same structure and functionality during the transition of the HBP platform host starting in 09/2021, making the Forum a consistent measure of user activity. There were 534 posts on the Forum during 07/2015-03/2021, with 2,492 total replies and 550,175 total views. The collection and analysis method complied with the terms and conditions for the source of the data.

Post-level HBP Forum data construction
We retrieved the full text of all threads posted on the public HBP Forum between 07/2015 and 03/2021 (last accessed on 03/31/2021), and cross-checked the relationship database. We processed the rich text data and extracted the topics discussed in each post, the timestamp and content of each post and reply, and the total number of views for each post. We then merged the user-level data (see section 2.3) into the post-level data to capture the nuances of who interacts with whom, and how this differs by type of post. We cleaned and organized the data at two levels: the post level, where each observation is a complete discussion thread following the initial question or topic posted, and the reply level, where each observation is an individual reply to a post. We obtained approval and an IRB exemption from the Data Protection Officer of the Max Planck Society, and received confirmation that all data used comply with the terms of the relevant sources and with current regulation. During our sample period, a total of 534 posts were initiated by 208 of 283 total active users. We define active users as users who ever posted on the platform, excluding registered users who never posted anything. On average, each post was viewed 1,030 times and received 3.7 replies. Across discussion topic categories, neurorobotics was the most popular, with 325 topics (60.8% of total) and 152 users (53.7% of total), followed by technical support (S1 Fig).
To capture nuances in HBP Forum discussions, we categorized each Forum post in two independent ways. First, we obtained the content-based sub-categories tagged by the Forum and grouped them into six major topic categories: neuromorphic, brain simulation/modeling, neurorobotics, technical support, organization, and others. Second, we analyzed all posts and manually coded whether each post contained a query to be solved and, if so, whether this query was solved by an HBP administrator/moderator or by users in the community. When multiple solutions were offered, we used the timestamp of the first solution to construct solving time. If an answer was first provided by a user and then clarified or confirmed by an administrator (2.5% of all posts), we classified the post as administrator-solved. Posts that did not raise a question are classified as informational (i.e., they were not question-oriented and thus could not be solved). Two informational posts were re-categorized as questions because users asked follow-up questions that were answered; our results are robust to dropping or re-categorizing these two cases. We also recorded the timestamp at which each query was solved. If an initial question was solved but inspired new questions and answers within the same post, we labelled the post as a multi-question post and recorded this information for each sub-question. Finally, we assigned two indicators to each post capturing whether code snippets are included and whether the topic being discussed is specifically related to the HBP platforms (i.e., not a generic question).
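The coding rules for solution status and solving time can be read as a small decision procedure. The sketch below is illustrative only: the record types `Post` and `Reply` and their flags (`offers_solution`, `admin_confirms`) are hypothetical stand-ins for the manual coding described above, not the authors' data format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Reply:
    timestamp: datetime
    author_is_admin: bool
    offers_solution: bool          # hypothetical manual-coding flag
    admin_confirms: bool = False   # hypothetical: an admin clarifies/confirms an earlier answer


@dataclass
class Post:
    created: datetime
    is_question: bool              # False for informational posts
    replies: List[Reply] = field(default_factory=list)


def label_solution(post: Post) -> Tuple[str, Optional[float]]:
    """Return (status, solving time in days) following the coding rules in the text."""
    if not post.is_question:
        return "informational", None          # nothing to solve
    replies = sorted(post.replies, key=lambda r: r.timestamp)
    solutions = [r for r in replies if r.offers_solution]
    if not solutions:
        return "unsolved", None
    first = solutions[0]                       # the first solution defines solving time
    days = (first.timestamp - post.created).total_seconds() / 86400
    # A user's answer later clarified/confirmed by an administrator counts as admin-solved.
    confirmed_by_admin = any(
        r.author_is_admin and r.admin_confirms and r.timestamp > first.timestamp
        for r in replies
    )
    solver = "admin" if first.author_is_admin or confirmed_by_admin else "user"
    return f"solved_by_{solver}", days
```

For example, a question posted on 1 January whose first answer comes from a user on 3 January and is confirmed by an administrator on 4 January would be labelled administrator-solved with a solving time of two days.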
To understand user diversity, equity, and inclusion on the platform, we further assigned indicators for whether a given post was created by a female user, whether it contained code snippets, whether the initial posting user was affiliated with an HBP partner institution, and whether the initial post was created by a senior user, and we recorded the country where the user is based according to his/her main employer. Before aggregating the data to the post and post-quarter levels, we calculated the share of users who are female, the share of replies that contain code, the share of users who are more senior, and the number of users affiliated with an HBP partner. In this way, the dataset captures not only the content of questions, but also the type of questions being discussed and the diversity of the Forum community.
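As an illustration of the aggregation step, the following sketch collapses hypothetical reply-level records to the post-quarter level, computing composition shares over distinct participating users and the code share over replies. The `ReplyRecord` fields, the quarter labels, and counting distinct countries as the geographic-diversity measure are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReplyRecord:                 # hypothetical reply-level record
    post_id: int
    timestamp: datetime
    user_id: str
    user_female: bool
    user_senior: bool
    user_hbp_partner: bool
    user_country: str
    has_code: bool


def quarter(ts: datetime) -> str:
    """Calendar-quarter label, e.g. 2019Q3."""
    return f"{ts.year}Q{(ts.month - 1) // 3 + 1}"


def aggregate_post_quarter(replies):
    """Collapse replies to one row per (post, quarter)."""
    groups = defaultdict(list)
    for r in replies:
        groups[(r.post_id, quarter(r.timestamp))].append(r)
    rows = {}
    for key, rs in groups.items():
        users = {r.user_id: r for r in rs}          # one profile per distinct user
        n_users = len(users)
        rows[key] = {
            "n_replies": len(rs),                    # dependent variable y_it
            "share_code": sum(r.has_code for r in rs) / len(rs),
            "share_female": sum(u.user_female for u in users.values()) / n_users,
            "share_senior": sum(u.user_senior for u in users.values()) / n_users,
            "n_hbp_partner_users": sum(u.user_hbp_partner for u in users.values()),
            "n_countries": len({u.user_country for u in users.values()}),
        }
    return rows
```

A post whose replies fall in two different quarters yields two rows, which is how the number of post-quarter observations can exceed the number of posts.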

User-level data construction
We further constructed a user-level database to capture details about who interacts with whom, and why. We combined information from multiple sources, including the HBP Forum, HBP PLUS (i.e., a user profile database maintained by the HBP for statistical reporting purposes, into which users can opt), HBP websites profiling key team members across project areas, and HBP YouTube channels with archived information on past team members. Specifically, we obtained the list of users, online usernames, real names, and institutional affiliations, whenever publicly available, for active users registered in the HBP Forum database. We used multiple matching algorithms to merge the Forum data with the other sources based on details including full name, institutional affiliation, country of residence, contact details, gender, fields of experience, and highest level of education. In particular, we matched the users with the information collected from HBP promotional videos on YouTube and from the internal HBP user database, HBP PLUS. The matching was based on usernames and real names, using Stata's fuzzy matching command "matchit" and the "merge" command. Each match was further verified manually. Where necessary, we supplemented these data with manual collection and disambiguation using Google Scholar searches, LinkedIn profiles, and institutional webpages. The dataset is anonymized and aggregated at the post level.
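To illustrate the idea of fuzzy name matching, the sketch below normalizes names (accents, case, punctuation) and scores pairs by character-bigram Jaccard similarity. It is a rough Python analogue in the spirit of string-similarity matching, not a reimplementation of Stata's `matchit`; the function names and the 0.5 acceptance threshold are assumptions, and in the procedure described above every candidate match would still be verified by hand.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Strip accents, lowercase, and collapse punctuation/whitespace to single spaces."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", " ", ascii_name.lower()).strip()


def bigrams(name: str) -> set:
    s = normalize(name)
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of character bigrams (0 = disjoint, 1 = identical sets)."""
    A, B = bigrams(a), bigrams(b)
    if not A or not B:
        return 0.0
    return len(A & B) / len(A | B)


def best_match(name, candidates, threshold=0.5):
    """Return (best candidate, score), or (None, score) if below the assumed threshold."""
    score, cand = max(((similarity(name, c), c) for c in candidates), default=(0.0, None))
    return (cand, score) if score >= threshold else (None, score)
```

Because word order does not change most bigrams, "Müller, Anna" and "Anna Muller" still score about 0.67, which is why such scores are typically used to shortlist candidate pairs for manual review rather than to accept matches automatically.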

Descriptive statistics
In each complete calendar year during our sample period, there are on average 98 posts and 366 replies posted in the Forum (Table 1).
For the survival analyses, we focus on the initial question posted, excluding follow-up questions inspired within a given thread. We also exclude informational posts, as they pose no questions to be solved. These criteria result in our sample for the survival analyses with 465 observations. The survival sample has a similar distribution to the full sample regarding post-level and user-level characteristics, including share of posts with code, posts related to the HBP platforms, share of posted questions solved, and fully identified users (Table 1, Panel C).
There are 260 active users in our full sample, among whom we fully identified 188 users (72%) regarding demographics, affiliation, and expertise. Unidentified users do not use their real names in their profiles or posts, and we do not have enough information to establish their identity. A large share of identified users is quite active, and most of them are affiliated with an HBP partner institution. At the user level (Table 1, Panel C), most active users of the HBP Forum are male (86%), slightly lower than the male share on Stack Overflow during 2015-2020 (>90%). The share of female users (14%) is higher than that on Stack Overflow (6% in Europe), but lower than the share of female neuroscientists worldwide (25%) (source: https://insights.stackoverflow.com/survey/2020) [32]. About 81% of Forum users are voluntary users (i.e., not administrators). 74% of users are affiliated with an HBP partner institution and are members (but not leaders) of HBP sub-projects. 92% of the active users are students or academic employees (e.g., researchers and software engineers). Non-academic users comprise computer scientists employed in private companies or self-employed entrepreneurs. Seniority level is coded for all active users; the nine users who changed seniority level during our sample period are coded accordingly. Graduate students form the largest group of Forum users (39%). Junior researchers (post-doctoral scholars and assistant professors) make up 14% of users, while senior researchers (associate and full professors) make up 18%. Finally, 18% of active users are senior software engineers employed by a research institution. We group senior researchers and senior software engineers together in our analyses.
Geographically, active users are based in 27 countries, with the top six countries accounting for about 76% of the users: Germany (30%), Switzerland (21%), United Kingdom (8%), Italy (7%), United States (6%), and France (3%). On average, the users participating in a post are based in 1.7 countries. Geographic diversity peaked in 2018 at 1.94 countries per post and remained stable for the rest of the observed period.

What drives richer online user interactions?
To examine the factors that drive richer online user interactions, we aggregated the data to the post-quarter level. The data structure is not a panel, as we usually have only one observation per post. The dataset contains 599 observations of 534 posts; the number of observations exceeds the number of posts because 58 posts received replies spanning more than one quarter (i.e., 3 months). We perform regression analysis at the post-quarter level using the following equation:
$$y_{it} = \alpha + \beta\,\text{HBP platform}_i + X_{it}'\gamma + \delta_t + \varepsilon_{it}$$
Here $y_{it}$ is the number of replies to post $i$ in quarter $t$, $\alpha$ is a constant, and $\varepsilon_{it}$ is the error term. $\text{HBP platform}_i$ indicates whether post $i$ is related to the HBP platforms. $X_{it}$ is a vector of post- and user-level characteristics, including geographic and gender diversity, the inclusion of code snippets, and HBP partner affiliation; we use shares instead of levels to standardize the measures of user composition. $\delta_t$ denotes year fixed effects. Heteroskedasticity-robust standard errors are reported. Because most post-level interactions happen within the first three months, there is too little within-post variation over time to allow for post-level fixed effects. All variables are aggregated to the post-quarter level.
Table 2 reports the results from the post-quarter-level analyses. Column (1) includes covariates related to post composition that vary at the post level. Column (2) comprises initial post characteristics. Column (3) combines the two sets of covariates. Column (4) further includes topic category fixed effects to account for differences in the underlying post topics. Throughout the specifications in columns (1)-(3), user interactions are significantly higher for posts related to the HBP platforms and for programming posts with code snippets. Posts with a higher share of code-embedded replies have on average more replies. In particular, the inclusion of code snippets in the initial question post is a strong predictor of more follow-up interactions, with positive and statistically significant estimates in all specifications.
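Assuming the post-quarter regression is estimated by ordinary least squares, the estimator and its heteroskedasticity-robust standard errors can be sketched in a few lines of NumPy. The function names are hypothetical, and the HC1 small-sample scaling n/(n−k) is an assumption about which robust variant is reported (it is the one Stata uses for `regress` with `vce(robust)`).

```python
import numpy as np


def design_matrix(hbp_platform, covariates, years):
    """Intercept, HBP-platform dummy, covariates, and year dummies (first year omitted)."""
    years = np.asarray(years)
    levels = np.unique(years)[1:]                     # drop base year to avoid collinearity
    year_dummies = (years[:, None] == levels[None, :]).astype(float)
    n = len(years)
    return np.column_stack([np.ones(n), hbp_platform, covariates, year_dummies])


def ols_hc1(y, X):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)            # sum of e_i^2 x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)      # sandwich with HC1 scaling
    return beta, np.sqrt(np.diag(cov))
```

Topic-category fixed effects, as in column (4), could be added as further dummies in the same way; a post-level indicator such as the HBP-platform dummy is then a linear combination of the category dummies, which is why it drops out of that specification.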
This is consistent with prior studies on the importance of including code snippets in the initial question to attract the attention of fellow users, and thus to obtain a faster and more targeted solution [33,34]. Contrary to prior studies [35,36], we do not find statistically significant differences in user interaction patterns related to the gender of the user who raises a question. Similarly, whether a question is asked by a user with advanced access to the HBP platform or by a more senior user does not significantly alter the interaction on the Forum. These results likely reflect, in part, the use of gender-neutral usernames and the absence of a direct information tag on users' HBP affiliation status or experience in their profiles and posts. This design feature of the Forum is worth further investigation in future studies of online institution building. In contrast, Forum administrators are clearly marked in their profiles, and a higher share of replies from administrators (i.e., more institutional support) is associated with less intense voluntary interaction. Further, greater geographic diversity among participating users is associated with significantly more replies to a post within a quarter, and this effect is stronger when controlling for both the post-level and the initial post characteristics in column (3).

PLOS ONE
To account for differences in the underlying content discussed in each post, we further control for fixed effects at the content category level (column (4)). In this more demanding specification, the HBP platform indicator drops out due to collinearity with the category fixed effects. After controlling for this content-level measure, more diverse country-level user participation (i.e., at the extensive margin) remains statistically significant and positively associated with active user interactions (i.e., at the intensive margin). Overall, we observe similar patterns in the main estimates: user interactions are more intense for posts asking questions related to the HBP platforms, posts including code snippets in the initial post and in the replies, and posts with geographically diverse users.

What factors make problem solving faster on the HBP Forum?
To further examine the factors that accelerate research problem solving on the HBP Forum, we start with Kaplan-Meier non-parametric survival estimates to visualize differences in problem solving associated with a few key factors. The Kaplan-Meier survival estimate at a given time represents the probability that a question is still unsolved after that point in time. Fig 3(a) suggests that HBP Forum administrators solve programming-related questions faster than users do. However, the difference in solving time between administrators and users is minimal for non-programming topics. The sharp drop in the share of unsolved posts to 25% within the first 25 days suggests that 75% of questions posted, programming-related or not, are solved by then. Fig 3(b) suggests that solving time does not differ much between posts initiated by users affiliated with an HBP partner institution and posts initiated by other users, consistent with our usage analysis in Section 3.1.
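For readers unfamiliar with the estimator, a minimal Kaplan-Meier sketch follows: the survival curve is the running product of (1 − solved/at risk) over the distinct solving times, and questions still unsolved at the end of the sample are treated as right-censored. The function name and its inputs are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], solved: List[bool]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct solving time; unsolved posts are right-censored."""
    data = sorted(zip(durations, solved))
    at_risk = len(data)            # questions still unsolved just before time t
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:   # group ties at time t
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1 - events / at_risk           # share of at-risk questions solved at t
            curve.append((t, surv))
        at_risk -= events + censored               # censored posts leave the risk set after t
    return curve
```

Reading such a curve at t = 25 days gives the share of questions still unsolved after 25 days; a value of 0.25 corresponds to the 75% solved share discussed above.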
To systematically understand the factors associated with online problem solving, we use a Cox proportional hazard model to analyze the solving time (in days) of the question raised in post $i$. For the general model, we apply the following functional form:
$$h_i(t) = h_0(t)\,\exp\left(X_i'\beta\right)$$
where $h_0(t)$ indicates the baseline hazard and each post is observed as the pair $(d_i, s_i)$. $s_i$ indicates whether the question raised in post $i$ is solved by the end of our sample period. $d_i$ is the number of days between when question $i$ is raised and when it is solved on the HBP Forum. $X_i$ contains an indicator of the relevance of a topic to the HBP platforms, as well as the set of time-invariant post-level covariates that we used in the previous analyses (section 3.1). The reported estimates are hazard ratios, $\exp(\beta)$: a ratio greater (smaller) than one indicates a positive (negative) relationship with the hazard, i.e., the instantaneous rate, of a post being solved (i.e., of $s_i = 1$). As a robustness check, we also conducted a time-to-event analysis that allows for a more flexible functional form of the baseline hazard by transforming the survival function using natural cubic splines as a link function [37].
Columns (1)-(4) of Table 3 report the results of the Cox proportional hazard model. Column (1) includes the post profile covariates, column (2) the initial post characteristics, and column (3) a combination of both. Column (4) replaces the capture-all HBP platform indicator with content-based topic category fixed effects. Column (5) shows the results of the time-to-event analysis using the same covariates as in column (3). The results in columns (1)-(3) suggest that posts related to the HBP platforms have a higher hazard of being solved than non-platform posts. The coefficient magnitudes are similar across specifications and indicate that the hazard of being solved is on average about twice as high for platform-related posts (e.g., Column (3): 1.956-1 = 0.956). The estimate in the time-to-event analysis has the same sign and is larger in magnitude.
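To make the partial-likelihood logic concrete, the sketch below fits a Cox model with a single covariate by Newton-Raphson, using the Breslow approximation for tied solving times. It is a pedagogical stand-in under stated assumptions: the models in Table 3 include several covariates and would normally be fit with an established routine (for example Stata's `stcox` or the `lifelines` Python package), and the function name and inputs here are hypothetical.

```python
import math
from typing import List, Tuple


def cox_one_covariate(durations: List[float], solved: List[bool], x: List[float],
                      iters: int = 50) -> Tuple[float, float]:
    """Maximize the Cox partial likelihood for one covariate; return (beta, hazard ratio)."""
    beta = 0.0
    n = len(durations)
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(n):
            if not solved[i]:
                continue                      # censored posts only enter through risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if durations[j] >= durations[i]:   # risk set: unsolved just before d_i
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            mean = s1 / s0                    # risk-weighted covariate mean
            score += x[i] - mean              # gradient of the log partial likelihood
            info += s2 / s0 - mean ** 2       # negative second derivative
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, math.exp(beta)
```

On a three-post toy example (covariate values 1, 0, 1 and solving times of 1, 2, and 3 days), the partial likelihood can be maximized by hand at a hazard ratio of 1/√2 ≈ 0.707, which this routine recovers; the baseline hazard never needs to be estimated, which is what makes the Cox model semi-parametric.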
When administrators participate, the hazard of a question posted on the Forum being solved is 28-33% higher in the Cox proportional hazard model (e.g., Column (3): 1.302-1 = 0.302), and 37% higher in the time-to-event analysis. Combined with the previous result that a higher level of administrator participation is associated with less voluntary user interaction, our findings suggest that administrator support helps solve questions faster. In contrast to the wholly user-driven Stack Overflow experience [33], institutional support by the administrators appears to be very beneficial for Forum users. Questions on programming issues with code and questions related to the HBP platforms are also solved faster. In column (4), the estimates on the categories "Brain Simulation and Modelling" and "Neurorobotics" reinforce the result that platform-related posts and application questions are solved faster. Consistent with our usage analysis, neither the gender of the user raising the question nor that of those who reply is statistically significantly associated with shorter solving time. We find no evidence of the differences in online knowledge sharing between women and men observed in other studies [38]. The hazard of being solved is also not significantly affected by a higher share of participating senior researchers.
In the previous analysis, greater geographic user diversity is associated with richer user interaction. The results in Table 3, however, suggest a 21-23% lower hazard of the post being solved (e.g., Column (3): 0.794-1 = -0.206) with each additional country represented among participating users. The results are similar in the Cox proportional hazard model and the time-to-event analysis. This result could imply that more time is needed to accumulate and harmonize knowledge from various sources, and that questions attracting more geographically diverse participation may be more complex. In the absence of a clear metric for question complexity, we leave a fuller investigation of this for future research.
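The percentage interpretations above follow from subtracting one from each hazard ratio. A small helper (its name is hypothetical) makes the arithmetic explicit and also shows that, under the proportional hazards assumption, a change of Δ units in a covariate multiplies the hazard by HR^Δ, so two additional countries would imply roughly a 37% lower hazard rather than 2 × 20.6%.

```python
def pct_change_in_hazard(hr: float, delta: float = 1.0) -> float:
    """Percent change in the hazard implied by hazard ratio hr for a delta-unit covariate change.

    In a Cox model the hazard ratio for a delta-unit change is hr ** delta.
    """
    return (hr ** delta - 1.0) * 100.0


# Hazard ratios from Column (3) of Table 3, as quoted in the text.
for label, hr in [("HBP platform post", 1.956),
                  ("administrator participation", 1.302),
                  ("one more country", 0.794)]:
    print(f"{label}: {pct_change_in_hazard(hr):+.1f}%")   # +95.6%, +30.2%, -20.6%
print(f"two more countries: {pct_change_in_hazard(0.794, 2):+.1f}%")   # -37.0%
```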

Additional discussions
In addition to providing an online community for neuroscientists, the HBP Forum also promotes open-source programming for back-end activities. While the HBP Forum provides direct user support on all HBP-related topics, some questions about the neurorobotics platform require further technical support and may be forwarded to an administrator-only repository, the neurorobotics Jira BitBucket, where more substantial issues and bugs are tracked and resolved. This repository is maintained by 14 of the 50 HBP Forum administrators in our sample. In our sample, 18 of the 534 posts were forwarded to Jira BitBucket. In a robustness test, we control for whether a post was forwarded to BitBucket as a proxy for complexity. Our results remain unchanged, as the set of questions forwarded outside the Forum is small. In addition, most user interactions are voluntary rather than directed communications between users with prior ties: we observe direct user tagging for additional support in only 25 posts, and only 14 users were ever tagged directly (nine of whom were administrators).
Our finding of a low share of female participation is in line with the results of gender studies on online platform collaboration: female users are in general under-represented on online technology-related Q&A platforms [38]. Studies show that female users are more inclined to participate in posts if other female users are already replying to a question [35,36]. As on Stack Overflow, the gender and affiliation of active users are not revealed in user profiles, so users are left to guess each other's gender [35,38]. This gender-neutral feature of platform design is worth further investigation and could potentially reduce gender-related biases.
Forum posts generate more discussion on topics related to the HBP platforms, on questions related to programming, and when the topic attracts more interest from users based in different countries. Most questions posted on the HBP Forum are solved within 16 days, and questions are solved faster when forum administrators participate and when code snippets are included. The Forum appears to be an inclusive online community, where usage and discussions do not differ significantly by HBP affiliation status. We find no evidence that the gender or seniority level of users alters discussion intensity or problem-solving probability.
Our results provide encouraging evidence that the online community built through the HBP has generated active participation among users from different institutions and with different educational levels who may not have otherwise connected. The institutional support provided by forum administrators appears helpful in supporting the collaborative progress of the online neuroscience community, which may be especially important at the current time, when physical distance to peers is increased. Our analyses offer a first glimpse into the facets of a particular online collaboration infrastructure within a large, long-term life science project.